Goto

Collaborating Authors

 post-training quantization


2SQ-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation

Neural Information Processing Systems

To address these issues, we propose S2Q-VDiT, a posttraining quantization framework for V-DMs that leverages Salient data and Sparse token distillation. During the calibration phase, we identify that quantization performance is highly sensitive to the choice of calibration data. To mitigate this, we introduce Hessian-aware Salient Data Selection, which constructs high-quality calibration datasets by considering both diffusion and quantization characteristics unique to V-DMs. To tackle the learning challenges, we further analyze the sparse attention patterns inherent in V-DMs.



PTQ4DiT: Post-training Quantization for Diffusion Transformers

Neural Information Processing Systems

The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computational demands at the inference stage. Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint by using low-bit weights and activations. However, its applicability to DiTs has not yet been explored and faces non-trivial difficulties due to the unique design of DiTs. In this paper, we propose PTQ4DiT, a specifically designed PTQ method for DiTs.


Q-VLM: Post-training Quantization for Large Vision-Language Models

Neural Information Processing Systems

In this paper, we propose a post-training quantization framework of large vision-language models (L VLMs) for efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by minimizing activation discretization errors, which fails to acquire optimal quantization strategy without considering cross-layer dependency.






SupplementaryMaterial

Neural Information Processing Systems

The results ofAddNN are basically consistent with the results in [1]. The accuracy drops after post-training quantization are reported in Table4.